Asymptotic Theory for Random Forests
Random forests have proven to be reliable predictive algorithms in many
application areas. Not much is known, however, about the statistical properties
of random forests. Several authors have established conditions under which
their predictions are consistent, but these results do not provide practical
estimates of random forest errors. In this paper, we analyze a random forest
model based on subsampling, and show that random forest predictions are
asymptotically normal provided that the subsample size s scales as s(n)/n =
o(log(n)^{-d}), where n is the number of training examples and d is the number
of features. Moreover, we show that the asymptotic variance can consistently be
estimated using an infinitesimal jackknife for bagged ensembles recently
proposed by Efron (2014). In other words, our results let us both characterize
and estimate the error-distribution of random forest predictions, thus taking a
step towards making random forests tools for statistical inference instead of
just black-box predictive algorithms.
Comment: This manuscript is superseded by "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests" by Wager and Athey (arXiv:1510.04342). The new paper extends the asymptotic theory developed here, and applies it to causal inference in the potential outcomes framework with unconfoundedness. The present version is maintained online for archival purposes only.
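The infinitesimal jackknife variance estimate described in the abstract can be sketched numerically. Below is a minimal illustration in which each "tree" is simply the mean of its subsample's responses (a real forest would fit a regression tree to the subsample); the variable names are my own, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 200, 2000          # training points and number of subsampled "trees"
s = 20                    # subsample size, small relative to n
y = rng.normal(size=n)

N = np.zeros((B, n))      # N[b, i] = 1 if point i is in subsample b
t = np.zeros(B)           # t[b] = prediction of "tree" b
for b in range(B):
    idx = rng.choice(n, size=s, replace=False)
    N[b, idx] = 1.0
    t[b] = y[idx].mean()

# Infinitesimal jackknife for bagged ensembles (Efron 2014):
# V_IJ = sum_i Cov_b(N[b, i], t[b])^2
cov = ((N - N.mean(axis=0)) * (t - t.mean())[:, None]).mean(axis=0)
V_IJ = np.sum(cov ** 2)
print(V_IJ)
```

The same covariance-of-inclusion computation applies tree-by-tree at any test point of a genuine random forest.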
The Efficiency of Density Deconvolution
The density deconvolution problem involves recovering a target density g from
a sample that has been corrupted by noise. From the perspective of Le Cam's
local asymptotic normality theory, we show that non-parametric density
deconvolution with Gaussian noise behaves similarly to a low-dimensional
parametric problem that can easily be solved by maximum likelihood. This
framework allows us to give a simple account of the statistical efficiency of
density deconvolution and to concisely describe the effect of Gaussian noise on
our ability to estimate g, all while relying on classical maximum likelihood
theory instead of the kernel estimators typically used to study density
deconvolution.
Subsampling Extremes: From Block Maxima to Smooth Tail Estimation
We study a new estimator for the tail index of a distribution in the Frechet
domain of attraction that arises naturally by computing subsample maxima. This
estimator is equivalent to taking a U-statistic over a Hill estimator with two
order statistics. The estimator presents multiple advantages over the Hill
estimator. In particular, it has asymptotically smooth sample paths as a
function of the threshold k, making it considerably more stable than the Hill
estimator. The estimator also admits a simple and intuitive threshold selection
rule that does not require fitting a second-order model.
Journal of Multivariate Analysis, 130, 2014
Comment: Added reference
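For reference, the classical Hill estimator that this abstract takes as its baseline can be computed as below; a minimal sketch on simulated Pareto data (the paper's smoothed subsample-maxima estimator itself is not reproduced here):

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of gamma = 1/alpha using the top k order statistics."""
    xs = np.sort(x)
    top = xs[-k:]                  # the k largest observations
    threshold = xs[-k - 1]         # the (k+1)-th largest, used as threshold
    return np.mean(np.log(top) - np.log(threshold))

rng = np.random.default_rng(1)
n = 10_000
alpha = 2.0                        # Pareto tail index; gamma = 1/alpha = 0.5
x = rng.uniform(size=n) ** (-1.0 / alpha)
print(hill_estimator(x, k=500))    # should be near 0.5
```

Plotting this estimate against k exhibits the well-known instability ("Hill horror plot") that the subsample-maxima estimator is designed to smooth out.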
Quasi-Oracle Estimation of Heterogeneous Treatment Effects
Flexible estimation of heterogeneous treatment effects lies at the heart of
many statistical challenges, such as personalized medicine and optimal resource
allocation. In this paper, we develop a general class of two-step algorithms
for heterogeneous treatment effect estimation in observational studies. We
first estimate marginal effects and treatment propensities in order to form an
objective function that isolates the causal component of the signal. Then, we
optimize this data-adaptive objective function. Our approach has several
advantages over existing methods. From a practical perspective, our method is
flexible and easy to use: In both steps, we can use any loss-minimization
method, e.g., penalized regression, deep neural networks, or boosting;
moreover, these methods can be fine-tuned by cross validation. Meanwhile, in
the case of penalized kernel regression, we show that our method has a
quasi-oracle property: Even if the pilot estimates for marginal effects and
treatment propensities are not particularly accurate, we achieve the same error
bounds as an oracle who has a priori knowledge of these two nuisance
components. We implement variants of our approach based on penalized
regression, kernel ridge regression, and boosting in a variety of simulation
setups, and find promising performance relative to existing baselines.
Comment: Biometrika, forthcoming
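The two-step idea can be illustrated in its simplest form: residualize the outcome on a pilot estimate of the marginal effect m(x) and the treatment on the propensity e(x), then regress residual on residual. A minimal sketch with a constant treatment effect and oracle nuisance values (all names are illustrative; a real application would estimate m and e with flexible learners and cross-fitting):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
e = 0.5 * np.ones(n)                    # known propensity (randomized design)
w = rng.binomial(1, e)                  # treatment assignment
tau_true = 1.5                          # constant treatment effect

y = x + w * tau_true + rng.normal(size=n)
m = x + e * tau_true                    # oracle marginal effect E[Y | X]

# Residual-on-residual least squares for a constant effect:
tau_hat = np.sum((y - m) * (w - e)) / np.sum((w - e) ** 2)
print(tau_hat)                          # close to 1.5
```

Replacing the closed-form least-squares step with any penalized loss minimizer over tau(x) gives the heterogeneous-effect version of the procedure.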
Experimenting in Equilibrium
Classical approaches to experimental design assume that intervening on one
unit does not affect other units. There are many important settings, however,
where this non-interference assumption does not hold, as when running
experiments on supply-side incentives on a ride-sharing platform or subsidies
in an energy marketplace. In this paper, we introduce a new approach to
experimental design in large-scale stochastic systems with considerable
cross-unit interference, under an assumption that the interference is
structured enough that it can be captured via mean-field modeling. Our approach
enables us to accurately estimate the effect of small changes to system
parameters by combining unobtrusive randomization with lightweight modeling,
all while remaining in equilibrium. We can then use these estimates to optimize
the system by gradient descent. Concretely, we focus on the problem of a
platform that seeks to optimize supply-side payments p in a centralized
marketplace where different suppliers interact via their effects on the overall
supply-demand equilibrium, and show that our approach enables the platform to
optimize p in large systems using vanishingly small perturbations.
Comment: Forthcoming in Management Science
Adaptive Concentration of Regression Trees, with Application to Random Forests
We study the convergence of the predictive surface of regression trees and
forests. To support our analysis we introduce a notion of adaptive
concentration for regression trees. This approach breaks tree training into a
model selection phase in which we pick the tree splits, followed by a model
fitting phase where we find the best regression model consistent with these
splits. We then show that the fitted regression tree concentrates around the
optimal predictor with the same splits: as d and n get large, the discrepancy
is with high probability bounded on the order of sqrt(log(d) log(n)/k)
uniformly over the whole regression surface, where d is the dimension of the
feature space, n is the number of training examples, and k is the minimum leaf
size for each tree. We also provide rate-matching lower bounds for this
adaptive concentration statement. From a practical perspective, our result
enables us to prove consistency results for adaptively grown forests in high
dimensions, and to carry out valid post-selection inference in the sense of
Berk et al. [2013] for subgroups defined by tree leaves.
Confidence Intervals for Nonparametric Empirical Bayes Analysis
In an empirical Bayes analysis, we use data from repeated sampling to imitate
inferences made by an oracle Bayesian with extensive knowledge of the
data-generating distribution. Existing results provide a comprehensive
characterization of when and why empirical Bayes point estimates accurately
recover oracle Bayes behavior. In this paper, we develop flexible and practical
confidence intervals that provide asymptotic frequentist coverage of empirical
Bayes estimands, such as the posterior mean or the local false sign rate. The
coverage statements hold even when the estimands are only partially identified
or when empirical Bayes point estimates converge very slowly.
High-Dimensional Asymptotics of Prediction: Ridge Regression and Classification
We provide a unified analysis of the predictive risk of ridge regression and
regularized discriminant analysis in a dense random effects model. We work in a
high-dimensional asymptotic regime where p and n grow large with p/n converging to a fixed aspect ratio gamma > 0, and allow for arbitrary covariance among the features. For
both methods, we provide an explicit and efficiently computable expression for
the limiting predictive risk, which depends only on the spectrum of the
feature-covariance matrix, the signal strength, and the aspect ratio gamma.
Especially in the case of regularized discriminant analysis, we find that
predictive accuracy has a nuanced dependence on the eigenvalue distribution of
the covariance matrix, suggesting that analyses based on the operator norm of
the covariance matrix may not be sharp. Our results also uncover several
qualitative insights about both methods: for example, with ridge regression,
there is an exact inverse relation between the limiting predictive risk and the
limiting estimation risk given a fixed signal strength. Our analysis builds on
recent advances in random matrix theory.
Comment: Added a section on prediction versus estimation for ridge regression. Rewrote introduction. Other results unchanged.
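The high-dimensional regime can be probed numerically: holding the aspect ratio p/n fixed while n grows, the predictive risk of ridge regression in a dense random effects model stabilizes. A small Monte Carlo sketch under isotropic features (an illustrative setup, not the paper's exact model or risk formula):

```python
import numpy as np

def ridge_pred_risk(n, p, lam, alpha2=1.0, sigma2=1.0, reps=20, seed=0):
    """Monte Carlo excess prediction risk of ridge with dense random effects."""
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(reps):
        beta = rng.normal(scale=np.sqrt(alpha2 / p), size=p)  # dense effects
        X = rng.normal(size=(n, p))                           # isotropic features
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
        beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        # for isotropic test points, excess risk is ||beta - beta_hat||^2
        risks.append(np.sum((beta - beta_hat) ** 2))
    return float(np.mean(risks))

# same aspect ratio p/n = 0.5 at two scales; the two risks should be close
print(ridge_pred_risk(n=100, p=50, lam=50.0))
print(ridge_pred_risk(n=400, p=200, lam=200.0))
```

Scaling the penalty lam proportionally to n keeps the per-observation regularization fixed across the two problem sizes.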
Semiparametric Exponential Families for Heavy-Tailed Data
We propose a semiparametric method for fitting the tail of a heavy-tailed
population given a relatively small sample from that population and a larger
sample from a related background population. We model the tail of the small
sample as an exponential tilt of the better-observed large-sample tail, using a
robust sufficient statistic motivated by extreme value theory. In particular,
our method induces an estimator of the small-population mean, and we give
theoretical and empirical evidence that this estimator outperforms methods that
do not use the background sample. We demonstrate substantial efficiency gains
over competing methods in simulation and on data from a large controlled
experiment conducted by Facebook.
Comment: To appear in Biometrika
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
Many scientific and engineering challenges -- ranging from personalized
medicine to customized marketing recommendations -- require an understanding of
treatment effect heterogeneity. In this paper, we develop a non-parametric
causal forest for estimating heterogeneous treatment effects that extends
Breiman's widely used random forest algorithm. In the potential outcomes
framework with unconfoundedness, we show that causal forests are pointwise
consistent for the true treatment effect, and have an asymptotically Gaussian
and centered sampling distribution. We also discuss a practical method for
constructing asymptotic confidence intervals for the true treatment effect that
are centered at the causal forest estimates. Our theoretical results rely on a
generic Gaussian theory for a large family of random forest algorithms. To our
knowledge, this is the first set of results that allows any type of random
forest, including classification and regression forests, to be used for
provably valid statistical inference. In experiments, we find causal forests to
be substantially more powerful than classical methods based on nearest-neighbor
matching, especially in the presence of irrelevant covariates.
Comment: To appear in the Journal of the American Statistical Association. Part of the results developed in this paper was made available as an earlier technical report, "Asymptotic Theory for Random Forests" (arXiv:1405.0352).
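The classical nearest-neighbor matching baseline that causal forests are compared against can be sketched as follows: estimate tau(x) as the mean outcome of the k nearest treated neighbors minus that of the k nearest control neighbors (a minimal illustration, not the causal forest algorithm itself; all names are my own):

```python
import numpy as np

def knn_matching_effect(x0, X, y, w, k=10):
    """tau(x0) via k nearest treated minus k nearest control outcomes."""
    d = np.sum((X - x0) ** 2, axis=1)
    treated = np.argsort(np.where(w == 1, d, np.inf))[:k]
    control = np.argsort(np.where(w == 0, d, np.inf))[:k]
    return y[treated].mean() - y[control].mean()

rng = np.random.default_rng(3)
n, p = 2000, 2
X = rng.normal(size=(n, p))
w = rng.binomial(1, 0.5, size=n)
tau = 1.0 + X[:, 0]                      # effect heterogeneous in covariate 0
y = X[:, 1] + w * tau + rng.normal(scale=0.5, size=n)
print(knn_matching_effect(np.array([1.0, 0.0]), X, y, w, k=25))  # tau = 2 here
```

Because the Euclidean distance weights all covariates equally, adding irrelevant covariates dilutes the neighborhoods, which is the failure mode the abstract highlights relative to adaptively split forests.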
- …